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performance both asymptotically and empirically and that an opti- 
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1. Introduction. Multivariate failure time data are encountered in many 
biomedical studies when related subjects are at risk of a common event or a 
study subject is at risk of different types of events or recurrence of the same 
event. Some examples are: epidemiological cohort studies in which the ages of 
disease occurrence are recorded for members of families; animal experiments 
where treatments are applied to samples of litter mates; clinical trials in 
which individual study subjects are followed for the occurrence of multiple 
events; intervention trials involving group randomization. A common feature 
of the data in these examples is that the failure times might be correlated. 
For example, in clinical trials where the patients are followed for repeated 
recurrent events, the times between recurrences for a given patient may be 
correlated. 

When there is at most one event for each subject and these subjects are 
mutually independent, the Cox [11] proportional hazards model has com- 
monly been used to assess the effects of the covariates on failure times. 
For multivariate failure time data, research efforts have been concentrated 
on marginal hazards models and frailty models. Related literature includes, 
but is not limited to, Wei, Lin and Weissfeld [28], Lin [20], Cai and Prentice 
[3, 4] and Spiekerman and Lin [25] for the marginal models and Vaupel, 
Manton and Stallard [27], Clayton and Cuzick [10], Anderson and Louis [2], 
Oakes and Jeong [22] and Fan and Li [16] for frailty models. The statistical 
methods developed for dealing with failure time data typically assume that 
the covariate effects on the logarithm of the hazard function are linear and 
that the regression coefficients are constants. These assumptions, however, 
are primarily made for their mathematical convenience. True associations in 
practical studies are usually more complex than a simple linear model can 
capture. 

An important extension of the standard regression model with constant 
coefficient is the varying-coefficient model. The varying-coefficient model ad- 
dresses an issue frequently encountered by investigators in practical studies. 
For example, the effect of an exposure variable on the hazard function may 
change with the level of a confounding covariate. This is traditionally mod- 
eled by including an interaction term in the model. Such an approach is a 
simplification of the true underlying association since a cross product of the 
exposure and the confounding variable only allows the effect of the exposure 
to change linearly with the confounding variable. In many studies, however, 
investigators express the belief that the rate of change is not linear and seek 
to examine how each level of the exposure interacts with the confounding 
variables. For example, in a study of cancer risk in uranium miners [23], 
radon exposure was measured for over 23,000 underground miners in the 
Czech Republic during 1949-1975. The mining industry's workplace safety 
measures which affect the inhalation of radon gas, such as ventilation con- 
ditions, have changed over the last fifty years. Therefore, the effect of a 
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fixed amount of exposure in the 1950s sliould not be treated tlie same as 
in the 1970s. How to handle this issue is of current active research inter- 
est in epidemiology. This leads to a general varying-coefficient model where 
the coefficient for radon exposure is a function of the calendar year and 
where this function can be nonlinear over time. Parametric models for the 
varying-coefficient functions are most efficient if the underlying functions are 
correctly specified. However, misspecification may cause serious bias and the 
model constraints may distort the trend in local areas. Nonparametric mod- 
eling is appealing in these situations. 

Varying-coefficient models have been studied in many non-failure time 
data settings such as multidimensional nonparametric regression, general- 
ized linear models, analysis of longitudinal data and nonlinear time series. 
They are particularly appealing in longitudinal studies because they allow 
one to explore how the effects of the covariates change over time. Related 
literature includes Hastie and Tibshirani [17], Carroll, Ruppert and Welsh 
[9] and Cai, Fan and Li [6]. For univariate survival time, the time- varying 
effect has been carefully studied by Murphy [21], Cai and Sun [7] and Tian, 
Zuker and Wei [26]. Applications of varying-coefficient models to survival 
analysis, particularly in the context of multivariate failure time, remain to 
be studied. New technical challenges arise in dealing with within-cluster de- 
pendence and the varying effects of an exposure variable. The local pseudo- 
partial likelihood in our setting is more sophisticated than that based on the 
time- varying model. In fact, the latter is no longer a proportional hazards 
model. 

In this paper, we study the marginal hazards model with varying coeffi- 
cients for multivariate failure time data. The rest of this paper is organized as 
follows. In Section 2, we formulate the varying-coefficient model and propose 
local pseudo-partial likelihood procedures for coefficient functions. We also 
establish asymptotic properties and propose a variance estimator. Further, 
we consider a computationally efficient one-step procedure and show that 
it is asymptotically equivalent to the local pseudo-partial likelihood estima- 
tor. In Section 3, we propose a weighted average approach to estimate the 
coefficient functions. We evaluate the proposed procedures through simula- 
tion studies and illustrate the proposed approach via an application to the 
Busselton Population Health Surveys data set in Section 4. Final remarks 
are made in Section 5. Proofs of theoretical results are given in Section 6. 

2. Marginal hazards model with varying coefficients. Suppose that there 
is a random sample of n clusters from an underlying population and that 
there are J members in each cluster. Let i indicate cluster and denote 
the jih member in the ith cluster. Let Tij denote the failure time, Cij the 
censoring time and Xij = mm{Tij , Cij) the observed time for member 
(i = 1, . . . , re, J = 1, . . . , J). Let Aij be an indicator which equals 1 if Xij is 
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a failure time and is otherwise. Varying cluster sizes can be accommo- 
dated by defining Tij = Cij = 0. Let J^ij{t) represent the failure, censoring 
and covariate information up to time t for member as well as the co- 
variate information for the other members in the ith cluster up to time t. 
The marginal hazard function is defined as Xij {t; J-ij (t)) =limfi^Q j^V [Tij < 
t + h\Tij > t,Tij{t)]. The observed data structure is {Xij,Aij,Zij{t),Vij{t)} 
for i = 1, . . . , n, where 'Zij{t) = (Zjji(i), . . . , Zijp{t))'^ and Vij{t) are two types 
of covariates, with V being an exposure variable of interest. We assume that 
the censoring times are independent of the failure times conditional on the 
covariates and that the observation period is [0,r], where r is a constant 
denoting the time for end of the study. 

To explore how the effect of the exposure variable Z changes with different 
levels of a covariate variable V, we consider the varying-coefficient model 



where Aoj(-) is an unspecified baseline hazard function pertaining to the 
jth member of each response vector, /?(•) is the regression coefficient vector 
that may be a function of the covariate Vij, g{-) is a nonlinear effect of Vij 
and both and g{-) are unspecified, continuously differentiable functions. 
Let Nij{t) = I{Xij < t,Aij = 1) denote the counting process corresponding 
to Tij and let Yij{t) = I{Xij > t) denote the 'at risk' indicator process. Set 
Mij{t) = Nij{t) - £Yij{s)Xij{s)ds. Note that Mij{t) is a martingale with 
respect to the marginal filtration Tt,ij = (^{Xij{s~),Yij{s),Zij{s), < s < 
t} as well as the union u-field UiLi -^t.^i = Yij{s), Zij{s), < 

s <t, i = 1, 2, . . . , n}. However, Mij{t) (i = 1, 2, . . . , n, j = 1, 2, . . . , J) is no 
longer a martingale with respect to the entire union a-field IJILi U/=i ^t,ij = 
a{Nij{s~),Yij{s), Zij{s),0 < s < t,i = 1,2, ... ,n,j = 1,2, J} because the 
observations within a cluster might be mutually dependent. 

For ease of presentation, we drop the dependence of covariates on the 
time Xij with the understanding that the methods and proofs in this pa- 
per are applicable to external time-dependent covariates [18]. If all of the 
observations are independent, then the partial likelihood for model (1) is 



where TZj^t) = {i : Xij > t} denotes the set of individuals at risk just prior to 
time t. Since the observations within a cluster are not independent, we refer 
to (2) as the pseudo-partial likelihood. Wei, Lin and Weissfeld [28] considered 
the parametric counterpart for (2). 



(1) 



X^jit■,J^^j) = Xojit)exp{P{Vijit)fZ,j{t)+giV^jit))} 
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2.1. Local pseudo-partial likelihood estimation. If the unknown functions 
(3{-) and g{-) are parameterized, the parameters can be estimated by max- 
imizing (2). For our nonparametric estimation, the forms of the unknown 
functions are not available. Directly solving the pseudo-partial likelihood (2) 
for the unknown functions /3(-) and g{-) is hardly possible due to the infinite 
dimension of the unknown parameters. We choose to use the local polyno- 
mial method for our nonlinear modeling for the following reasons. First, it 
is relatively easy to program because existing software on parametric fitting 
can be modified, via the introduction of a weighting scheme, to deal with 
local parametric problems. Second, the sampling properties of local polyno- 
mial fitting can be derived and efficient semiparametric estimators can be 
constructed. 

Assume that all functions in the components of (3{-) and g{-) are smooth 
so that they admit Taylor expansions: for each given v and u, where u is 
close to V, 

,3^ (3{u) « (3{v) + (3'{v){u -v) = d + rj{u- v), 

g{u) K, g{v) + g'{v){u - u) = a + 7(m - u), 

where /3'(m) = dj3{u)/du. Substituting these local models into (2), we obtain 
the following logarithm of the local pseudo-partial likelihood: 

J n 

j=i i=i 

X Aij|(5^Zij + ri'^ZijiVij -v) + -f{Vij - v) 
(4) -log!" J2 exp{d''Zij + rj''Zi,{Vij-v) 

Here Kh(-) = K{-/h)/h, K{-) is a probability density called a kernel function, 
and h represents the size of the local neighborhood called a bandwidth. The 
kernel weight is introduced to reflect the fact that the local model (3) is only 
applied to the data around v. 

Using counting process notation and letting X*^- = {Zfj, Zjj{Vij — v),Vij — 
v)'^ and ^ = {8^ , t]^ , 7)^, the local pseudo-partial likelihood function (4) can 
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be expressed as n • oo), where 



J n 



j=l i=l -^0 



(5) 



KhiVij 



dNijiw). 



Maximizing i(j,S,rf) in (4) is equivalent to maximizing in{$,T) in (5). For 
a technical reason, following work in the literature, we maximize (5) for a 
given finite r. 



Let ${v) = {j{v) 



jTiiv) ) be the maximizer of (5). Then (3{v) 



d{v) is a local linear estimator for the coefficient function (3{-) at the point v. 
Similarly, an estimator of g'{-) at the point v is simply the local slope j{v), 
that is, g'{v) ='j{v). The curve g{-) can be estimated by integration of the 
function g'{v). Following Hastie and Tibshirani [17], the integration can be 
approximated by using the trapezoidal rule. The local pseudo-partial likeli- 
hood estimator in (4) is particularly easy to compute. It can be implemented 
by using existing software such as SAS or S-PLUS with the Cox regression 
procedure. The only difference is that one needs to incorporate the kernel 
weights in the Cox regression and must repeatedly apply the procedure at 
a grid of points in the range of the variable V. 



2.2. Assumptions and notation. To express explicitly asymptotic bias 
and asymptotic variance of the estimator, we introduce some necessary as- 
sumptions and notation. Let Hi = J x'^Kix) dx and Vi = J x^K^(x) dx for i = 
0, 1, 2. Denote P{w,z,v) = P{X > w\7j = ■z,V = v) and p{w,z,v) = P{w,z,v) x 
v)'^z + gQ{v)}. For A; = 0, 1, 2, define ajkiw, v) = fj{v)E{p{w, Zj,v) x 
= v}, where /j(-) is the density of Vj, and Z®*^ = 1, Z and ZZ^ 
0, 1 and 2, respectively. Let s.jk{v) = Jq a.jk{w,v) dw and a.k{v) = 
J2j=i^jk{'v)- We will drop the dependence of a.k{w,v), a.jf:{w,v), SLjkiv) and 



exp{/3oi 
for k = 



Sik{v) on V when there is no ambiguity. Finally, let 



r = rw 




a,-2 



{w)a.ji{w)'^ a.jQ{w) ^XQj{w)dw 



and 



Q 



(ao 



ai 



a^ a2ai^ 



where Qi = a2 



T -1 

aiafag . 
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Let II • II denote the L2-norm and || • ||<j, be the sup-norm of a function or 
process on a set ^. The support of the random variable V is denoted by V. 
For a compact subset $y of V and some e > 0, we define the neighborhood 
set of ^v,e as ^v,e = {u ■ inf^g$^ \u — v\ < e}. For A; = 0, 1, 2, let 

Sjkiw,C,v) = fj{v) J E[P{w,Z,,v)A{yXj,w)R.fiy,w)\Vj=v]K{y)dy, 

where Kj{y,w) = {Zj{w),Zj{w)y,yy and A{y,Cj,w) = exp{C^Rj(y, -w) + 

^^K,{0,w)}, where ^o(-) = iPli-) , P'oi'f , g'oi-)V ■ 

The following conditions are needed in the proofs of the main results: 

(i) The kernel function K{-) > is a bounded, symmetric function with 
compact support. 

(ii) The functions /3(-) and g{-) have continuous third derivatives around 
the point v. 

(iii) fj{-) is continuous at the point v. 

(iv) The conditional probability P{w, Zj(w), •) is equicontinuous at v and 
Zj[w) is continuous about w for each j. 

(v) (a) nh/ logn oo and nh^ is bounded; (b) /J" Xoj{t) dt < oo for each 
iG{l,2,...,J}. 

(vi) Sjr{t, 6, v), j = 1,2, . . . , J, r = 0, 1, 2, is bounded away from on the 
product space [0,r] x C x ^v,e, that is, 

inf inf inf s,>(t, 0, f) > 0, 

ie[0,r] 6»eC v(^<l>v,e 

where 6 = {0^ ,g) and 

sup sup^||Zj(t)f exp(/3^Zj(t) + c/) < oo 
te[o,T] 0ec 

for each j G {1, 2, . . . , J}. Meanwhile, Sjo(*) ^)) J = !> 2, . . . , J, are continu- 
ous functions for {t,0,v) G [0,r] x C x ^v,e-, uniformly in t G [0,t], and 

d 

Sji{t,e,v) = -Q^sjo{t,e,v) 



and 



92 

Sj2{t,0,v) = -^Sjo{t,e,v). 



(vii) (Asymptotic variance) The matrix 



(w) 



dAoj{w) 
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is positive definite for any v G ^v,e and the matrix 



is nonsingular at v ^ ^v,e- 

(viii) Tlie conditional probability P{u,Zij{u),w) is equicontinuous in the 
arguments {u,w) on [0,r] x (^w,e- 

(ix) The compact set C W has the following property: iniu^^we fji'^) > 
for each j and some e > and ||/j||(i>y < oo. 

(x) The covariate process Zj(t) has a continuous sample path in a subset 
Z of the continuous function space and |Zjjfc(0)| + /J" \dZijk{t)\ < a.s. for 
all fc and some constant <oo. 

The above conditions will be used for deriving the pointwise convergence 
properties of ^ and demonstrating its asymptotic normality. Conditions (i)- 
(v) are similar to those in [15] and conditions (vii)-(viii) are similar to 
conditions C and D of [1]. In order to derive the uniform consistency result, 
conditions (ix)-(x) are also necessary. From the proofs of the theorems, 
continuity of Zj(t) in assumption (x) can be weakened to Zj(t) being left 
continuous with right-hand limits and E[e'X'p{P{V)'^7i{t)}7i{t)^^\V = v] and 
E{Zi{t)®^\V = v} being continuous functions of t for A; = 0, 1, 2. 

2.3. Asymptotic properties. We now establish the asymptotic properties 
of the local pseudo-partial likelihood estimator. We summarize the results 
here and provide outlines of the proofs in Section 6. As shown in Section 6, 
the local pseudo-partial likelihood function £ni$,T) is concave in ^ and its 
maximizer exists with probability tending to one. Let H be a {2p -|- 1) x 
{2p + 1) diagonal matrix with the first p diagonal elements being 1 and the 
rest being h, where p is the number of elements in Z. 

Theorem 1. Under conditions (i)-(viii), we have 



functions. If, in addition, conditions (ix)-(x) are satisfied, then we have the 
uniform consistency 



where any compact subset of the support of the random variable V . 

Theorem 2. Assume that conditions (i)-(viii) are satisfied. Then the 



random vector {nh)-^/'^{e^{$,Q{u),T) - \h^V2[{T-^{u)(3l{u)Y .{i]^} con- 



verges in distribution to a {2p+ \ )-variate normal vector with mean zero and 




sup|H(^(u)-^oW)l 



p 



0, 
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covariance matrix U, where 1'^{$,,t) = din{$,T)/d^, is a p-variate column 
vector with all entries and 11 = IIo + D, in which D = blockdiag(r~"'^z^O) Q2Z^2) 
and 



no = E E lim ^/iB„ij(r)B„iKT)^, 

'^-^ '—^ n— >oo J ^ ' 

t/ie definitions o/B„ij(r) and B„ii(T) appearing in the proof of Theorem 2. 

Theorem 3 (Asymptotic normality). Assume that conditions (i)-(viii) 
are satisfied. Then 

where is a {2p + 1) x {2p + 1) matrix with the first p x p elements being 
1 and the rest being 0, and 5] = A~^n(A~^)-^. 

From the expressions for the asymptotic bias and variance matrix S in 
Section 6, it can be shown that they can be consistently estimated by 

(6) A~\T,v)Bn{T,v) and (n/i)"^A~^(r,';;)n„(r,i;)A-^(r,?;), 

where 

n J 



-I 71/ J nq- 

^n{T,v) = ^ E E / Kh{V,j - v){\Jl^{w) - Ej{w,v)) dNijiw), 

i=l j=l"'0 

-, n r J T- ^ ®2 

""^^'""^ ^7^P\pJo ^"^^'^ ~ " Ej{w,v))dMi,{w)j 

with Snjk{w,v) = iEr=ii^h(^*, -t^)l^«iHexp(r(7;)X^(u;))(U^H)®^ for 
k = 0,1,2, V*j = H-iX*^., = Snji{w,v)/Snjo{w,v) and My(t) = 

jQXij{s)ds, in which Aij(s) = Xoj{s) exp0(yij{s))Zij{s) + giVijis))} 
and Aoj(s) is given in the following section. 

2.4. Estimation of the baseline hazard function. With estimators of (3{-) 
and g{-), we can estimate the baseline hazard function by using a kernel 
smoothing, 



\oj{t)= / Wb{t-x)dAoj{x), 
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where Wb is a given kernel function and 6 is a given bandwidth. The cumu- 
lative hazard function Aoj(-) can be estimated by 



uniformly on (0, r] in probability. 

To investigate the asymptotic properties of the estimated cumulative haz- 
ard function, we assume, for simplicity, that g{V) = 0. The function g{-) 
needs to be estimated by integrating its derivative estimator from the par- 
tial likelihood; hence, its asymptotic properties are challenging to obtain. 
When g(-) = our task is somewhat simplified. The generalized Breslow 
estimator for Aoj(i) is given by 



in the metric space = C[0,r] with the norm p{f,g) = maxi<j<j 

SUPo<t<r -*■(^)|• 
THEOREM 5. Assume that conditions (i)-(x) are satisfied and let nh^ — > 
0. Then the random process vector W(t) = (Wi{t), . . . ,Wj{t)) converges 
weakly to a zero-mean Gaussian random field G{t). 

Remark 1. The covariance structure of the Gaussian field G{t) is very 
complex. It is very difficult to directly calculate this covariance by asymp- 
totic methods. The wild bootstrap provides a useful method for computing 
the covariance or approximating the distribution of G{t) (see [25]). 

Remark 2. Theorem 5 shows that the estimator Aoj(t) is root-n consis- 
tent if the nonparametric estimators are undersmoothed. This means that in 
practical applications, one uses the right amount of smoothing for estimat- 
ing coefficient functions and then chooses a smaller amount of smoothing 
for estimating the cumulative hazard functions. The situation here is very 
different from the one-step likelihood estimation of Carroll et al. [8], but 
similar to their one-step procedure. 




Theorem 4. Under conditions (i)-(x), we have 
Aoj(i) — > Aoj(t) and Xoj{t) — > . 



Aoj(t) 



Write Wj{t) = n 
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2.5. One-step local pseudo-partial likelihood estimator. To estimate the 
functions and g{-) over an interval of interest, we usually need to max- 
imize the local pseudo-partial likelihood (5) at hundreds of points. This can 
be very computationally intensive. In addition, for certain given v, the local 
pseudo-partial likelihood estimator might not exist, due to a limited amount 
of data around v. These drawbacks make computing the local pseudo-partial 
likelihood estimator over an interval less appealing. We consider the follow- 
ing one-step estimator as a feasible alternative. 

To facilitate notation, we drop the dependence of ini^,,T) on r. The local 
pseudo-partial likelihood estimator ^ satisfies £'n{^) = 0. For a given initial 
estimator j^q, by Taylor expansion, we have 

C(to)+C(to)(t-to)«0. 

Thus, the one-step estimator is defined as 

(7) Ls = ko-{Ciko)}-'Ciko)- 

In the Newton-Raphson algorithm, the above equation is iterated until con- 
vergence is achieved. As shown in Section 6, the function in{^) is concave. 
Hence, its maximizer exists and is unique when ^„(^) is strictly concave. 
In practice, we do not have to iterate (7) until convergence is achieved — 
once, or a few times, will suffice. Robinson [24] gives results on the distance 
between the estimators based on a few iteration steps and the maximum 
likelihood estimator. A natural question arises as to how good the initial 
estimator has to be in order for the one-step estimator to have the same 
performance as the maximum local pseudo-partial likelihood estimator. It is 
not hard to show that a sufficient condition is 

(8) mo-^o) = Op{h' + {nh)''/'y, 

see Fan and Chen [13] for a derivation in the local likelihood context. When 
condition (8) is not satisfied, a multiple-step estimator is needed. By repeat- 
edly applying the one-step result k times, as in [24], condition (8) can be 
relaxed to H(to - ^o) = Op{{h^ + V^}. 

Cai, Fan and Li [6] provide a useful strategy on the choice of initial estima- 
tors in the context of generalized linear models and their idea can be adapted 
to the current setting. The idea is to exploit the smoothness of nonparamet- 
ric functions. The strategy is as follows. Compute the local pseudo-partial 
likelihood estimates at a few fixed points. Use these estimates as the ini- 
tial values of their nearest grid points and obtain the one-step estimates at 
these grid points. Use the newly computed one-step estimates as the initial 
values of their nearest grid points to compute the one-step estimates and 
propagate until the one-step estimates at all grid points are computed. For 
example, in our simulation studies, we evaluate the functions at n„^id = 200 
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grid points and are willing to compute the maximum local pseudo-partial 
likelihood at five distinct points. A sensible placement of these points is 
■"^20) 'W^eoi ""^100) ""^140 and wiso- We shall use, for instance, f3{wQo) as an initial 
value for calculating the one-step estimates for P{w5q) and (3{wqi) and then 
proceed to use the resulting estimates as the initial values for calculating 
the one-step estimates for (3{w5s) and 0{wq2)i respectively. We continue this 
process until all the one-step estimates at Wi for i = 40, . . . , 79 are calculated. 

3. Weighted average estimator. An alternative approach is to fit a varying- 
coefRcient model for each failure type, that is, for event type j, fitting the 
model 

hjit^Tij) = XQj{t)e-x.v{l3j{Vij{t))'^Zij{t)+gj{Vij{t))], for i = 1, . . . ,n, 

resulting in tj(^) for estimating $,j{v) = {(3j{v),{f3'j{v))'^ ,gj{v)). Under 
model (1), we have = ^2 = ' ' ' = = ^- Thus, we can estimate ${v) by 
a linear combination ci^i{v) + ■ ■ ■ + c j$,j{v) with J2j=i Cj = 1- Weights Cj 
can be chosen to optimize the performance. Note that the weights Cj can 
be generalized to a matrix Cj to allow for different linear combinations for 
different components of ${v), that is, the linear combination can be gener- 
alized to Ci^i + • • • + C j^j with Ci + • • • -I- Cj = diag(l, . . . , 1) being the 
identity matrix. 

In order to establish the asymptotic distribution of the weighted aver- 
age estimator, we need to derive the asymptotic distribution of ^{v) = 
• • • 'tj)"^- We define ^{v) and ^"{v) similarly to ^'(f), except that the 
$j are replaced by and respectively, for j = 1, 2, . . . , J. Using argu- 
ments similar to those used for Theorems 2 and 3, it can be shown that the 
following theorem holds. 

Theorem 6. Under the conditions of Theorem 2, we have that 

is asymptotically normal with mean and covariance matrix S* = {Gkii^k^^l)) 
for k,l = 1, . . . , J , where R = diag(Ri, . . . , Rj) and Hj is a (2p+ 1) x (2p + 1) 
matrix with the first p x p elements being Ip (an identity matrix) and rest 
equal to 0. Gki{^k,^i) is defined at the end of this section. 

The asymptotic normality of the weighted average estimator follows easily 
from Theorem 6. For example, suppose that we are interested in estimating 
the kth entry of (3. Write Ik as a {2p+ l)-variate vector with the A;th entry 
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being 1 and the rest being and let C = diag(l|', . . . , l^") = Ij (g) 1^. Then 
it follows from Theorem 6 that 

inh)y'f^iPJv)-(3,iv)) - ^^CR^"(^;)| ^Ar(0,S^), 

where = 0^^, . . . , P^j)'^ , /3q = {P^^, . . . , f3kj)'^ , and S^ = C^S;*C, P^j 
and Pi^j being the kth entry of Pj and Pj, respectively. The optimal weight 
which minimizes c-^S^„c with J2j=i'^j = 1 is c^, = (e'^S~^e)~^S~"'^e. Since 
the failure times for different types of failures are usually mutually depen- 
dent, $j (j = 1, . . . , J) are likely to be dependent; hence, the variance is 
not necessarily diagonal. This implies that the optimal weight is unlikely to 
be uniform. In other words, the weighted average estimator with the opti- 
mal weight is generally more efficient than the estimator with the "working 
independence" weight. This is supported by the simulation results displayed 
in Table 2. 

We now give the expressions for the asymptotic variance-covariance ma- 
trix from Theorem 6 and its estimate. From Theorem 3, it is easy to show 
that the asymptotic covariance matrix between {nh)^/^'H.{^j^{v) — ^fco(^)) 
and {nhy/'^'H.{li{v) - $io{v)) is given by 

GfcK^fc,^i) = A,-i(^fc)Jim i?{ni,(^,)ni,(^,)}Ari(^,), 

where = /J" Kh{Vjk - v) [^7*^ - Ski{w, C, v)/sko{w, C, v)] dMjk{w), C = 

H(^ — ^q) and Ci ^)) d = 0,l, are defined as in Section 6. Prom the 

definition of it is natural to estimate liuin^oo 

by 

n 

(9) Dki{h,h) = n-^j2^jkiL)wj;ih), 

where 

JnjO[Xjk,V)) 

m=l Yli=l Yik{Xmk) ^WiPk i^ikiXmk))Zik{Xmk) + 5fc(Knfc(^mfc))} 

X {u;,{x^k) - f''[l'^''"\ ]K,{V,k - V). 
Obviously, Aj(^) can be estimated by 

Ai(^) = - 2^ / ^hiVij - v) 5 — dNij{w). 

"^^0 {SnAw,v)f 
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Write 

By means of some tedious proofs, we can show that Gki{$,k,$,i) is a consistent 
estimator of Gfc/(^fe'^/)- Hence, the covariance matrix of {^i, . . . can be 
consistently estimated by S = {nh)~^{Gij{^^,^j)):- j^^. These results pro- 
vide a basis for simultaneous inferences about the ^j, j = 1, 2, . . . , J, as well 

as for the weighted average estimator J2j=i Cjkj for ^. 
4. Numerical examples. 

4.1. Simulations. We perform a series of simulation studies to evaluate 
the performance of the proposed estimation method. Multivariate failure 
times are generated from a multivariate extension of the model of Clayton 
and Cuzick [10] in which the joint survival function of [Ti, . . . ,Tj) given 
(Zi,...,Zj) and iVi,...,Vj) is 

(11) F{h, ...,tj;Zi,...,Zj,Vi,...,Vj) = [Yl SjihT' - (J - 1) 

where J takes integer values and Sj{t) is the marginal survival probability for 
the j'th member, depending on covariates Zj and Vj. Note that is a parame- 
ter which represents the degree of dependence of Tj and Tj, i,j = l,2,...,J. 
The relationship between Kendall's r and 9 is r = 0/{2 + 6). Specifically, 
9 = 0.25 and = 4 represent weak and strong positive dependence, respec- 
tively, with 9^0 giving independence and — > oo giving maximal positive 
dependence. In our simulation, 9 was chosen to be 0.25, 1.5 and 4.0, which 
correspond to low, moderate and high positive dependence, respectively. The 
Gaussian kernel function is used for the estimates. 

In our first set of simulations, we examine the performance of the local 
pseudo-partial likelihood estimators. We consider the marginal distribution 
of Tij to be exponential with failure rate 

(12) X,j{t) = Xoj{t) eMP{Vij)Zij + g{Vij)}. 

We choose the baseline hazard function to depend on time, specifically, 
Xojit) =4t^X*., where j = 1,2,3. We take X*Qj to be 0.2, 1.0 and 1.5 for 
J = 1, 2 and 3, respectively. Failure times {tii,ti2, Us), z = 1, 2, . . . , n, are gen- 
erated from the distribution function (11) with marginal distribution (12). 
We generate the covariate vector Zjj = {Ziji, Zij2, ■ ■ ■ , Zijp)'^ from a mul- 
tivariate normal distribution with marginal mean 0, standard deviation 5 
and correlation between Ziji and Zijk equal to p''~^, where p= l/Vb. We 
consider p = 2 and the varying coefficients 

pi{V) = 0.5V {1.5 -V), /32(V^) =sin(2y) 



MULTIVARIATE HAZARDS REGRESSION 



15 



(a) Average Estimates of [)^(.) 



(b) Average Estimates of 



(c) Average Estimates of g{.) 




1 




Fig. 1. Simulation results for the local pseudo-partial likelihood estimator with 200 clus- 
ters and the dependence parameter of failure time 6 = 0.25. (a), (b) and (c) provide the 
average estimates of l^^i-) and g{ ) for heavily censored data with c = 2, respec- 

tively. Solid curves are true functions. Three bandwidths are used: dash-dotted curves for 
bandwidth h = 0.1, dashed curves for h = 0.2 and dotted curves for h = 0.4. 



and 

5(y)=0.5{e^-i-5-e-i-5}, 

where V is generated from a uniform distribution over [0,3]. In our simu- 
lation, using a similar derivation as in [5], with given covariates (Ziij,Vij), 
j = 1,2,3, the failure times {tii,ti2,ti-s) are generated from independent uni- 
form random variables {wii,Wi2,Wi3) as follows: 

Ui = [-log(l - i(;a)T(V-i, Zii, ASl)]^/^ 

ti2 = [^log{l - aa+aa{l - Wi2f~"-''^~'}T{V^2,Zi2,Xo2)f\ 
ti3 = [^log{l - (ail +ai2) + {aii+ai2 - 1)(1 - iWig)^^"'^^^"'} 

xT(y,3,Z.3,AS3)]'/'. 

Here an = (1 - wuy^ for I = 1,2 and i = l,2,...,n and T{V,Z,X*) = 
exp{/3(y)Z + g(y)}/X* . Censoring times Cij are generated from the uni- 
form distribution over (0,c), where c is a constant which is set to control 
the censoring rate. There is approximately 10% censoring when c = 5 and 
approximately 30% censoring when c = 2. For each of the configurations 
studied, 500 simulations were carried out. 

Table 1 summarizes the simulation results for the local pseudo-partial 
likelihood estimator of /3(-) and g'{-) with the number of clusters being 200, 
6 = 0.25 and c = 2. We present the estimates of the functions evaluated at 
V = 0.5, 1.0, 1.5, 2.0 and 2.5. The bandwidths we considered were h = 0.075, 
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Table 1 

Summary of simulation results based on local pseudo-partial likelihood procedures 



/32(-) g'{-) 
V h bias SE SD bias SE SD bias SE SD 



0.5 


0.075 


-0, 


.071 


0, 


,194 


0, 


,262 


-0.179 


0, 


,266 


0, 


,370 


-0, 


,059 


1, 


,931 


1, 


.501 




0.100 


-0, 


,036 


0, 


,158 


0, 


,177 


-0.095 


0, 


,215 


0, 


,266 


-0, 


,011 


1, 


,118 


0, 


.956 




0.150 


-0, 


,007 


0, 


,121 


0, 


,133 


-0.004 


0, 


,160 


0, 


,175 


0, 


,003 


0, 


,538 


0, 


,493 




0.200 


0, 


,028 


0, 


,096 


0, 


,101 


0.077 


0, 


,121 


0, 


,125 


0, 


,044 


0, 


,328 


0, 


.295 




0.400 


0, 


,077 


0, 


,076 


0, 


,086 


0.252 


0, 


,087 


0, 


,108 


0, 


,247 


0, 


,167 


0, 


.123 


1.0 


0.075 


-0, 


,041 


0, 


,187 


0, 


,252 


-0.173 


0, 


,279 


0, 


,354 


0, 


,085 


1, 


,817 


1, 


.495 




0.100 


-0, 


,009 


0, 


,153 


0, 


,188 


-0.094 


0, 


,218 


0, 


,259 


0, 


,079 


1, 


,140 


0, 


.933 




0.150 


0, 


,004 


0, 


,115 


0, 


,118 


0.006 


0, 


,156 


0, 


,164 


-0, 


,007 


0, 


,456 


0, 


.451 




0.200 


0, 


,027 


0, 


,092 


0, 


,093 


0.101 


0, 


,113 


0, 


,124 


-0, 


,066 


0, 


,305 


0, 


.257 




0.400 


0, 


,108 


0, 


,063 


0, 


,064 


0.382 


0, 


,066 


0, 


,082 


-0, 


,055 


0, 


,132 


0, 


.093 


1.5 


0.075 


-0, 


,004 


0, 


,174 


0, 


,231 


-0.031 


0, 


,181 


0, 


,236 


0, 


,066 


1, 


,843 


1, 


.527 




0.100 


0, 


,016 


0, 


,142 


0, 


,164 


-0.015 


0, 


,145 


0, 


,157 


0, 


,032 


1, 


,040 


0, 


.949 




0.150 


0, 


,019 


0, 


,110 


0, 


,114 


-0.007 


0, 


,116 


0, 


,115 


0, 


,035 


0, 


,533 


0, 


.496 




0.200 


0, 


,023 


0, 


,092 


0, 


,094 


0.010 


0, 


,096 


0, 


,097 


0, 


,035 


0, 


,329 


0, 


.304 




0.400 


0, 


,075 


0, 


,061 


0, 


,060 


0.063 


0, 


,065 


0, 


,068 


-0, 


,181 


0, 


,131 


0, 


.095 


2.0 


0.075 


0, 


,097 


0, 


,196 


0, 


,277 


0.145 


0, 


,235 


0, 


,326 


0, 


,140 


1, 


,886 


1, 


.503 




0.100 


0, 


,076 


0, 


,167 


0, 


,195 


0.070 


0, 


,192 


0, 


,224 


0, 


,104 


1, 


,075 


0, 


.956 




0.150 


0, 


,047 


0, 


,129 


0, 


,142 


0.004 


0, 


,143 


0, 


,139 


0, 


,078 


0, 


,566 


0, 


.521 




0.200 


0, 


,037 


0, 


,107 


0, 


,112 


-0.044 


0, 


,114 


0, 


,117 


0, 


,091 


0, 


,355 


0, 


.339 




0.400 


0, 


,049 


0, 


,072 


0, 


,076 


-0.254 


0, 


,068 


0, 


,076 


-0, 


,098 


0, 


,138 


0, 


.116 


2.5 


0.075 


0, 


,206 


0, 


,315 


0, 


,413 


0.141 


0, 


,261 


0, 


,329 


0, 


,470 


1, 


,982 


1, 


.635 




0.100 


0, 


,135 


0, 


,260 


0, 


,305 


0.057 


0, 


,206 


0, 


,251 


0, 


,210 


1, 


,160 


1, 


.058 




0.150 


0, 


,074 


0, 


,200 


0, 


,216 


-0.004 


0, 


,151 


0, 


,166 


0, 


,095 


0, 


,633 


0, 


.570 




0.200 


0, 


,039 


0, 


,166 


0, 


,179 


-0.074 


0, 


,119 


0, 


,128 


0, 


,005 


0, 


,385 


0, 


.384 




0.400 


-0, 


,014 


0, 


,125 


0, 


,130 


-0.222 


0, 


,087 


0, 


,098 


-0, 


,261 


0, 


,189 


0, 


.181 



0.1, 0.15, 0.2 and 0.4. The averages of the 500 estimates for P2{v) 
and g'{v) subtracting their true values are given in the "bias" columns and 
the standard deviations of the 500 estimates are given in the corresponding 
SD columns. The SE columns give the averages of the estimated standard 
errors. Figure 1 provides the average estimates for /32(-) and g{-) based 

on different bandwidths. It gives us an idea of how large the biases are for 
different bandwidths. From Table 1, we can also see that as the bandwidth 
increases, the variance decreases. As expected, with a large bandwidth h = 
0.4, the bias is large and the variance is small. Note that the absolute biases 
exhibit a U-shape in Table 1. This is unusual, but can happen. The bias 
depends on function values in a local neighborhood and is continuous in h. 
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If the bias associated with h = 0.075 is negative and the bias associated with 
h = 0.4 is positive, as in this example, then the bias necessarily crosses zero 
and the U-shape absolute biases emerge. Table 1 also shows that when the 
bandwidth is 0.15 and above, the average SE is close to SD, which indicates 
good performance of the variance estimator. We have also examined the 
situations involving moderate and high dependence of the failure times, using 
9 = 1.5 and 4, respectively, as well as lighter censoring (with c = 5). The 
conclusions are similar. 

We also examined the performance of the weighted average estimator. 
The results are presented in Table 2. The rows labeled "W" are those 
based on the weighted average estimates for f3p with optimal weight Cp = 

{e^Ylp e)^^'Sip e, where Sp is the estimator of the asymptotic variance- 
covariance matrix of {/3pi,$p2, Pps)'^ ■ The maximum local pseudo-partial like- 
lihood estimates are indicated by "P" . The performance of an estimator /3(-) 
is assessed via the square root of average square errors (RASE), 

/ "grid \ 1/2 

(13) RASE = [/5(^fc) - P{wk)f , 

where {wk, k = 1, . . . ,ngrid} are the grid points at which the functions /3(-) 
are estimated. In our simulations, we take f^grid = 

200. 

For the local pseudo-partial likelihood estimates, we used bandwidth h = 
0.15. For the weighted average estimates, a bandwidth oih = 0.225 was used 
since the amount of data used for estimating the covariate effects for each 
event type was significantly less. The censoring parameter c was set to be 5. 
The weighted average estimate cannot always be calculated since the data 
for each event type could be too sparse to permit an estimate for each type. 
We only report those estimates which exist. From Table 2, we can see that 
the weighted average estimator has smaller RASE than that for the local 



Table 2 

Comparison of the local pseudo-partial likelihood estimator (P) and the weighted average 

estimator (W) 





EST 






RASE 




/32(-) 




Abias 


SD 


SE 


Abias 


SD 


SE 


RASE 


61 = 0.25 


P 


0.0693 


0.1063 


0.1045 


0.1269 


0.0198 


0.1068 


0.1036 


0.1086 




W 


0.0767 


0.0928 


0.1222 


0.1204 


0.0494 


0.0898 


0.1047 


0.1025 


e = 4.00 


P 


0.0653 


0.1065 


0.1029 


0.0156 


0.0215 


0.1041 


0.1013 


0.1063 




w 


0.0750 


0.0915 


0.1437 


0.0140 


0.0477 


0.0879 


0.1037 


0.1000 



Note: Abias is the average absolute bias of the estimator /3j for j = 1, 2 and RASE denotes 
tlie square root of average square errors of the estimator jSj . 
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Table 3 

Comparison of the average square errors of the local pseudo-partial likelihood estimator 
(P) with those of the one-step estimator (OS) 







e 


= 0.25, c = 


2 


e 


= 4.0, c = 


2 


h 


Estimator 


mean 


median 


std 


mean 


median 


std 


0.1 


P 


0.0879 


0.0738 


0.0602 


0.0710 


0.0569 


0.0550 




OS 


0.0879 


0.0740 


0.0603 


0.0716 


0.0573 


0.0551 


0.2 


P 


0.0276 


0.0199 


0.0239 


0.0287 


0.0220 


0.0273 




OS 


0.0276 


0.0198 


0.0239 


0.0287 


0.0220 


0.0272 


0.4 


p 


0.0107 


0.0084 


0.0020 


0.0216 


0.0140 


0.0214 




OS 


0.0107 


0.0084 


0.0020 


0.0216 


0.0140 


0.0214 






6 


= 0.25, c = 


5 


e 


= 4.0, c = 


5 


0.1 


p 


0.0350 


0.0279 


0.0256 


0.0584 


0.0499 


0.0506 




OS 


0.0350 


0.0278 


0.0255 


0.0586 


0.0500 


0.0506 


0.2 


p 


0.0200 


0.0137 


0.0182 


0.0205 


0.0148 


0.0199 




OS 


0.0200 


0.0137 


0.0182 


0.0204 


0.0148 


0.0199 


0.4 


p 


0.0162 


0.0113 


0.0148 


0.0162 


0.0117 


0.0146 




OS 


0.0162 


0.0113 


0.0148 


0.0162 


0.0117 


0.0146 


Note: 


"mean," "median 


" and "std" denote 


the average. 


median and sample 


standard 



derivation of the average square errors, respectively, based on 300 simulations. 

pseudo-partial likelihood estimator in most of the cases when the weighted 
average estimator can be calculated. 

In the second set of simulations, we compare the performance of the one- 
step estimator (OS) to that of the maximum local pseudo-partial likelihood 
estimator (P). We use model (11) with somewhat different configurations. In 
particular, V is now generated from the standard uniform distribution over 
[0, 1], Z is independently generated from a standard normal distribution, 
= 0.25 and 4 and (Aqi, A02, Aqs) = (0.2,1.0,1.5). Censoring times Cij are 
generated from the uniform distribution over (0,c) with c = 2 and 5. We 
take g{u) = 8u{l — u) and I3{u) = exp(2« — 1). 

Table 3 presents the summary of the average square errors (ASE = RASE^) 
for the one-step estimator (OS) and the maximum local pseudo-partial like- 
lihood estimator (P) under various realizations. From the table, we can see 
that the performance of the one-step estimator is very close to that of the 
maximum local pseudo-partial likelihood estimator. Figure 2, which presents 
the box plots for the two methods, also confirms this. We have also conducted 
simulations using the parameters considered in the first set of simulations. 
The results are similar. 

4.2. Application to Busselton population health surveys. We illustrate 
the proposed method by analyzing a data set from the Busselton Popula- 
tion Health Surveys. The Busselton Population Health Surveys are a series 
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of cross-sectional health surveys conducted in the town of Busselton in West- 
ern Australia. Every three years from 1966 to 1981, general health informa- 
tion for each adult participant was collected by means of a questionnaire 
and a clinical visit. Details of the study are described in [12, 19]. Data for 
several cardiovascular risk factors are available for 2202 persons who make 
up 619 families. In this analysis, we investigate the effect of cardiovascular 
risk factors on the risk of death due to cardiovascular disease (CVD) based 
on these family data. Since the death times of the family members might 
be correlated due to genetic factors and cohabitation, we are dealing with 
multivariate failure time data. 

The risk factors we considered included age (in years), body mass in- 
dex (BMI, in kg/m^), serum cholesterol level (chol) and smoking status. 
Serum cholesterol (in mmol/L) was determined from a blood sample. Partic- 
ipants' smoking statuses were classified into three categories: never smoker, 
ex-smoker and current smoker. Two indicator variables were created to in- 
dicate the three levels of smoking status: smokel is coded as 1 for ex-smoker 
and otherwise; smoke2 is coded as 1 for current smoker and otherwise. 

If a person took part in more than one of the Busselton surveys, only the 
record from the survey at which that person's age was closest to forty-five 
years is included. Forty-eight percent of the participants were males (gender 
= for male and 1 for female). The average age in the data analyzed was 
41.7 years, ranging from 16.3 to 89.0 years old. The average cholesterol 
reading was 5.65 mmol/L. The average body mass index was 24.8 kg/m^. 
The prevalences of the never-smokers, ex-smokers and current smokers were 
49%, 17% and 34%, respectively. Of the 619 families, there were 154 families 
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(b) Box-plots of WMSE for moderately censored data 
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Fig. 2. Simulation results for the comparison of the maximum local pseudo-partial likeli- 
hood estimator (P) with the one-step estimator (OS), (a) The box plots are for the distri- 
bution of the ASE over the 300 replications, using three bandwidths /i = 0.1, 0.2, 0.4 (from 
left to right). Column numbers 1, 3 and 5 plot the maximum local pseudo-partial likelihood 
estimator and column numbers 2, 4 and 6 plot the one-step estimator (OS) for heavily 
censored data with c — 2; (b) the same as (a) for moderately censored data, c—5. 



20 



J. CAI, J. FAN, H. ZHOU AND Y. ZHOU 



with one event, 28 families with two events and 3 famihes with more than 
two events. There were 219 observed events in all. 

For this analysis, we are interested in investigating how the effect of the 
risk factors changes with age. We consider the model 

Xij{t\J='ij) = Aoj(t) exp(^i(agejj) * gender^^- + /32(agejj) * hmUj 

+ /?3(agejj) * choljj + /?4(agejj) * smokeljj 
+ Pbiageij) * smoke2ij + g{ageij)), 

where j = 1 and 2 denote the parents and the children of the family, respec- 
tively, and smokel and smoke2 are the indicators for ex-smoker and current 
smoker, respectively. We take the bandwidth to be /i = 0.15 * (max(age) — 
min(age)) = 10.905. 

Figure 3 presents the estimates for the varying coefficients as functions of 
age. From Figure 3(a), we can see that men have a higher risk of dying from 
CVD than women with the hazard ratio being 1.96 with 95% confidence 
interval (CI) of (1.30,3.03) at age fifty. The effect does not seem to change 
much with age for those older than thirty-five. From Figure 3(b), BMI has 
little effect on the risk because the coefficient is close to zero over the span of 
age. From Figure 3(c), higher cholesterol level is associated with higher risk 
of dying from CVD and the effect of cholesterol increases with age. The haz- 
ard ratio for 1 mmol/L change in cholesterol is 1.01 (95% CI: [0.80, 1.28]) at 
age forty and 1.30 (95% CI: [1.12,1.53]) at age sixty-five. From Figures 3(d) 
and (e), ex-smokers have risk similar to that of the never smokers, while cur- 
rent smokers have a higher risk of dying from CVD compared to the never 
smokers. The effect of current smoking is higher for younger people with the 
hazard ratio being 5.60 (95% CI: [2.19,14.34]) at age forty and 1.07 (95% 
CI: [0.72,1.60]) at age sixty-five. 

5. Concluding remarks. The local pseudo-partial likelihood is a powerful 
and a straightforward tool for analyzing multivariate failure time data. The 
estimator asymptotically follows a normal distribution. Simulation results 
show that the asymptotic approximation is applicable to finite samples with 
moderate numbers of clusters. 

The weighted average estimator, when it can be calculated, can be a 
more efficient alternative to the maximum local pseudo-partial likelihood 
estimator. A disadvantage of the weighted average estimator is that it cannot 
always be calculated since it involves estimating the covariate coefficient for 
each failure type and the data for each failure type could be too sparse to 
permit a reliable estimate. 

Use of the one-step estimator is an effective means to reduce the computa- 
tional burden of an estimator involving iterations. We showed theoretically 
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{b) Estimator of ^^(.) for Bt^l 



(c) Estimator of B,<.) for CHOL 
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(d) Estimator of for Smol(e1 




(e) Estimator of ().(.) for SmolteS 
10' ^ 
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(f) Estimator of derivative g'(.) 
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(g) Estimator of intercept 9(.) 




Fig 

ard rate model is A 



3. Data analysis for Busselton Population Health Surveys study 
{t)^Xo,it)eMEl 



The marginal haz- 
j^l3k{Vij{t))Zij{t) + giVijit))), where V"(t) = age 
and (t) = (Gender, BMI, CHOL, Smokel, Smoke2), corresponding to the plots (a)-(e), 
respectively, (i) is the plot of g[-), and (g) is the plot of g{-), where smokel is coded as 1 
for ex-smoker and otherwise, smoke2 is coded as 1 for current smoker and otherwise. 
The dotted curve is the confidence curve on nominal level a = 0.05. In this setting, the 
chosen bandwidth is h = 0.15(max(age) — mm(age)) — 10.905. The x-axis is for age. 



and empirically that the one-step estimator is an excellent approximation of 
the fully iterated maximum local pseudo-partial likelihood estimator. 

Our proposed methods are sensitive to the choice of bandwidth in con- 
structing a local smooth estimation. The one-step estimator and maximum 
local pseudo-partial likelihood estimator have the same asymptotic distribu- 
tion and share the same asymptotic bandwidth. We can use the sophisticated 
bandwidth selection rule proposed in [14] for these estimators. 

The methods proposed in this paper can easily be extended to the more 
general form of multivariate failure time data. More specifically, suppose 
that there are n clusters that in each cluster there are K correlated individ- 
uals and for each individual there are J possible distinct failure types. The 
marginal hazard function for the jth type of failure of the kth individual in 
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the ith cluster is related to the corresponding covariate vector Zjjfc(t) by 
Xijk{t,Zijk) = Xoj{t) expiP'^ {Wijk)Zijk{t) + g{Wijk)} , 

where Aqj (t), j = 1,2, . . . , J , are unspecified positive functions and (3{-) and 
g{-) are defined as in (1). The maximum local pseudo-partial likelihood for 
the more general model can be derived similarly and the asymptotic prop- 
erties can be established with a similar, but more tedious, approach. 

6. Proofs. Let (Q, T , P(/3^g^x)) be a family of complete probability spaces 
provided with a history = {J-^t} for an increasing right-continuous filtration 
J-f C J-'. Let Yij{t) = I{Xij > t). We assume that Vij is -measurable and 
that Nij{w) and Zij^w) are .7^-adapted. Let J^t,ij = cr{Nij{'w~),Zij{w), 
Vij,Yij{w), 0<w<t}, i=l,2,...,n, j = l,2,...,J, and Mij{t) = Nij{t) - 
jQYij{w)Xij{w) dw, i = 1,2, . . . ,n. Obviously, Mij{t) is a 'J2=iJ^t,ij martin- 
gale. 

To facilitate technical arguments, we will reparameterize the local pseudo- 
partial likelihood (5) via the transform C, = H(^ — ^q). Hence, the loga- 
rithm of the local pseudo-partial likelihood function has the form tn{C^t) = 
^„(H~^^ + ^Q, t). By simplification, we have 




X [Cui^iw) + il^Uw) - \ogSnjo{w,C,v)] dN.jiw), 
where lJ*j{w) =H~^X.*j{w) and 
1 " 

Snjkiw, C, v) = -Y^Kh{Vij-v)Y,j{w) exp{C^U*,(u;) +^JX^(«;)}(U*^.(«;))« 

1=1 

Furthermore, for each w £ [0,r] and k = 0, 1,2, we rewrite £niC) = ^{C^t) 
and define 

S:^,iw, 0,v) = ^J2 KH{V.,-v)Y,iw) eMP'^mZ^,iw) + g{V,M^:^Hf 

Ti . 
1=1 

where ^(•) = /J'l-f, slOf, 0{.) = {0^ {■) , g{.))^ , v G $y. 

Recall the notation p{w,z,v) introduced in Section 2.2. Define, for v £ 

Sjo{w,e,v) = fj(v)E[p{w,Zj(w),v)\Vj =v\, 

s*i {w,e,v) = fj {v)E[p{w, Zj {w) , v){Z] iw) , 0, 0)^ | = v] , 

s*2{w,e,v) = fj{v)E[p{w,Zj{w),v)E{Zj,w)\Vj = v]. 
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where 

/Zj{w)Zj{w) 

E{Zj,w)= Zj{w)Zj{w)fi2 Zj{w)fi2\. 

\ ^J{w)fl2 /i2 

To facilitate notation, the true functions Oq{u) = {(3q (u), gQ{u))^ , ^o{u), 
Co = and v shall be omitted in S*jj^{t,6,v), SnjkitjC^'v) and s*f,{t,6,v), 
Sjk{t,Cv) whenever there is no ambiguity, for example, 

Snjkii) = Snjkii^v) = S^j^{t,0o,v), s*fc(t) =s*k{t,v) = s*k{t,eo,v), 

Snjkit) = Snjkit,v) = Snjkit,0,v), Sjfc(t) = Sjk{t,v) = Sjkit,0,v), 

Snjk{t,C) = Snjk{t,C,v), Sjk{tX) = Sjk{tX,v)- 

We will need the following two lemmas in the proofs of the theorems. Let 

n 

Cnjit) = y^JitMy^3, {Vij " v)/h, Zij{t))Kh{V,, - v) 



i=l 



for a function 



Lemma 1. Assume that conditions (i) and (iv) hold. Assume that tp{-, •, •) 
is continuous for its three arguments and that E{'ilj{Vj,w,Zj{t))\Vj = v) is 
continuous at the point v for each j and w. If h ^ in such a way that 
nh/ logn^oo, then 

J 

sup J2\Cnj{t)-Cj{t)\^0, 

where Cj{t) = fj{u) J E{Y{t)Tp{v,w,Zj{t))\Vj = v)K{w)dw, fj{u) being the 
density function ofV. Under conditions (viii)-(x), we have 

J 

\Cnj{t,v) - Cj{t,v)\ 0, 

where B is a compact set satisfying infueB fj{u) > 0. 

Proof. By the assumption on h, it is easy to show that for every t G 
[0,r], 

(14) \c^.^t)-Cj{t)\^0. 

Now, we divide [0,t] into M subintervals with a given length not 

exceeding 5. Note that 6 does not depend on n. Then 

(15) ^max^ \CnJ{t^) - Cj{t,)\ 0. 
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Thus, we need only deal with the term 

(16) max sup \Cnjit) - Cjit) - {CnjiU-i) - C,{ti_i)}\. 

l<i<M|i_i^_^|<5 

By decomposing into a positive part and negative part, we can decompose 
Cnj{t) into C^j{t) and C~j{t). Hence, we need only show that as n ^ oo and 
6^0, 



(17) 



max sup |C7+.(t) -C+(ti_i)| 

+ max sup \CUt)-CUti^i) 

l<i<M ' . . J ^ 







\t-ti^i\<S 



and a similar result for C~j{t) and C~(t). 

We now focus on the first term of (17), which is bounded by Ji + J2, 
where 



Ji = niax sup 



l<i<M 



|t-tj_i|<<5 



1=1 

x{^+iVij,{Vij-v)/h,Zij{t)) 



■i^+iVi,,iVi,-v)/h,Zi^{ti_i))} 



and 



Jo = max sup 



n 



X ^^{Vij, {Vij - v)/h,Zij{ti.i))Kh{Vij - v) 



Note that Zij{t), / = 1,2, . . . ,n, is continuous on [0,t]. Thus, Ji is bounded 
by 

max max sup \tp^{Vij,{Vij — v)/h,Zij(t)) 

l<l<n l<i<M \t-ti^j^\<S 



-^P+iVl„{Vl, - v)/h,Zi^{ti.i))\n-^Y.^h{Vij - v), 



1=1 



which tends to zero in probability. Since Yij{t) is a decreasing function of t, 
we have, for any e > 0, that the probability P {J2 > e) is bounded by 



MP 



Y,I{ti-i<^ij <ti 



X iJ+{Vij,{Vij -v)/h,Zij{U.i))KhiVi, - v) 
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It is easy to show that 



n 



1=1 
p 



E{I{ti.i < Xij <ti)il;+{v,w,Zij{ti.i))\Vij = v}K{w)dw. 

Note that by the Cauchy-Schwarz inequahty, 
E{I{ti^i < Xj <ti)tl;+{v,w,Zj{ti_i))\Vj =v) 

= \P{ti-i\Vj = v)- P{ti\Vj = v)\^/^E^/\i;+\v,w, Zj{ti.i))\V = v) 

since |tj — < 5, where M* is some constant. Hence, J2 ^ as first ?i — > 00 
and then 5 — > 0. 

The second term of (17) is bounded by 

maxi<i<Af sup|(„(^_^l<5/, (u) J E{Yj{t){i;+{v,w,Zj{t)) 

-ij+{v,w,Zj{ti^i))\Vj =v}K{w)dw 

(18) r 

J E{I{ti.i < Xj < U) 



+ max fj(v) sup 
l<i<M \t-u_,\<5 



X ■ip~^{v,w, Zj{ti^i))\Vj = v}K{w) dw 



which tends to zero as (5 — > 0. This imphes that (17) holds and hence com- 
pletes the proof of Lemma 1. □ 

Lemma 2. Under conditions (i)-(vi), we have for A; = 0, 1,2, 

S*njk{w) = s*j,{w) + Op{l), 

uniformly for w G (0,r], where s*^,{w) = s*f,{w,6Q,v) and, in addition, under 
conditions (viii)-(x), we have 

sup \\s;;,jk{w,v) - s*kiw,v)\\ =op(i). 

Furthermore, we have 

sup \\Snjk{w,C) - Sjk{w,C)\\ =Op{l). 

Lemma 2 can be easily proven by Lemma 1. 
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Proof of Theorem 1. We first show that C ^ in probabihty, where 
^ = H(j^ — ^q), ^ being the maximum local pseudo-partial likelihood estima- 
tor of . Let 



1 " /■* 
Xnj{tX) = -Y. / KhiVi 

Then it is easy to show that 



C ^ij{w) - log -^-^ — — 



dMij{w). 



(19) 
where 



Lit, c) - L{t, 0) = ^ x„,{t, c) + Yn(t, c), 



.7=1 



t wT/- 1 Snjo{w,C) 



JnjO[W,U) 



Xoj{w) dw. 



By Lemma 2, we obtain that 



Ynitx)=Y: I 



(s*iHfC-lo: 



SjQ{wX) * 



Y{tX)+op{l) 



\Qj{w) dw + op(l) 



By an argument similar to that in [1], it can be shown that each term in 
the sum of the asymptotic representation of Yn{tX) is a strictly concave 
function in Q and that it has the maximum value at C = 0- The first term 
in (19) is a sum of J local square integrable martingales with the square 
variation process being 



1 /■* 

{Xn,,Xn,){t) = -Y. Kl{V,,-v) 



Snjo{w,0) 



X Yi,j{w)exp{PQ{Vij) Zij{w) + go{Vij))Xoj{w) dw. 

It follows from Lemmas 1 and 2 that 

EXl^it, C) = E{Xnj,Xnj){t) = 0{{nhy^) ^0, < t < r. 

This implies that Xnj{t, C) — > in probability for 1 < j < J. Hence, we obtain 
that 

in{tX)-Ut,o) = Y(tx) + op{i). 

We can easily show that ^ maximizes the strictly concave function init, C) ~ 
^„(t,0). By Lemma A.l of [8], it follows that ^ ^ in probability. 
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We now prove the second result of Theorem 1. By the same argument as 
above, we can prove from Lemma 1 that 

sup sup sup \in{t, C) - in{t, 0) - Y{t, C)| ^ 

te[o,T] ioec* vG'S'v 

in probabihty, where C = — ^q) and C* is a convex and compact subset 
of R'^P'^^. Therefore, it follows from Lemma A.l of [8] that sup^g^^ |^| — > 
in probability. Hence, the proof of Theorem 1 is complete. □ 



Proof of Theorems 2 and 3. Note that we have proved in Theorem 1 
that H(^(t;) — ^o('^)) — > in probability. This result is very useful for proving 
Theorem 2. We divide the proofs into the following three steps: 

(a) The asymptotic normality of £'^{0). The logarithm of the local pseudo- 
partial likelihood function can be written as 



4(0) 



i=l j=l 

^ n J 







Snjliw,v) 
Snjoiw,v) 



dMij{w) 



1=1 j=l 



Snjliw,v) 



Snjo{w,v)_ 

X ex.p{(3Q{Vij)'^Zij{w) + go{Vij))Yij{w)\oj{w) dw 
= /i(r,0)+/2(T,0). 

We first deal with I2{t,0). Noting that 

n J 



1 



i = l j = l 



Snjl{w) 



Snjo{w). 

X [exp{Po{VijfZy{w)+goiVy)} 

- expi^I^ij + 9o{v))] 
X KhiVij - v)Yij{w)Xoj{w) dw, 

it follows from a Taylor expansion and Lemma 1 that 



/2(r,0) 



2n 



n J 

EE 



2 Jo 



yi,Hexp(^^ X*^. + <7o(^^)) 

X [(3':,[vf7:,,{w)+g'i{v)]{V^, - vfKh{Vi, - v) 
X \Q,j{w) dw{l + Op{h)) 



sUw) 



p{w,Zj{w),v) 
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x[/3^'(^;fZ,H+ff^;(^;)] 
X Xoj{w)dw{l + Op{h)), 



where s*f^{w) = s*j^{'w,9Q,v) for A: = 0,1, 2. Since K{-) is a symmetric func- 
tion, by simple calculation we have 

(20) hir, 0) = B„(r, v) = ^h^U2[{T-' p'^{v)f , 0^, 0]^(1 + Op{h)). 

We now consider Ii(t, 0). Let B„jj(T) = Jq Kh{Vij — v)[\J*j{w) — 

g^o(«;'c't') -l dMij{w). By conditions (vi)-(x). Lemma 2, Lemma A.l of [25] 
and some tedious and routine calculation, we can prove that 

i ± r K,(V„ - V) [14^ - Jfifm dM,.(„) = Op((„/,)-'/^). 



i=l • 

Hence, it follows that 



in J 

^i(^'0) = -EEB™,(r)+op(l). 



1=1 ]=i 

Note that \/n7i/i(r, 0) is a sum of i.i.d. random vectors X]/=i ^nijC''") with 
zero mean and finite variance. The desired asymptotic normality follows from 
the multivariate central limit theorem by using the Lyapunov condition. It 
can be shown that the asymptotic variance is 
/J \®2 
n = nl^^MEB™,(r) 

(21) ^'=' ' J J 

= E„lil^^^B„i,(r)®2 + ^ ^ Jim ^/.B„i,(r)B„iKr)^- 

Note that Y^=\^nii{fy is a local square-integrable martingale with respect 
to the filtration IJ^^j^ Tt^ij = (j{Nij(w~)^ Tiij{w) ^Vij ^Yijiw) < w <t,i = 1,2, 
. . . ,n}. Hence, it can be shown that the first term of (21) converges to D. 
By the Cauchy-Schwarz inequality, we can easily see that lim„^oo Eh^nij (t) x 
B„i/(r)'^ exists. Write Ilji{T,v) = lim.„_»oo-E'/iB„ij(r)B„ii(T)^. Hence, we 
can prove that the second term of (21) converges to IIo{t, v) = J2i^j n/j(T, v) 
for the limit matrix Hij (r, v) . The proof of Theorem 2 is then completed by 
using the asymptotic results for /i(r, 0) and /2(t, 0). 

(b) Convergence of the Hessian matrix. We shall show that the second 
derivative of the logarithm of the local pseudo-partial likelihood function 
converges to a finite constant matrix. We have shown in Theorem 1 that 
^ — > in probability. Hence, by the mean value theorem, we have 

(22) ~Cic)=iM + op{i). 



MULTIVARIATE HAZARDS REGRESSION 29 
Since Sjj^^w) = Sjk{w) exp{gQ{v)), k = 0, 1,2, from Lemma 2, we can obtain 

C(0) = -X T.T.K.{V.,-v) ^^^^ dN.,H 

+ op(l). 

Write Fu{w) = P{X <w,A = l\Vj = u) and denote its corresponding condi- 
tional empirical distribution Fnj{w) = ^ X]r=i ~ v)I{Xij < w,Aij = 
1). By means of the conventional argument used in kernel smoothing, to- 
gether with empirical process theory, it can be shown that 

(23) '-^'^-^0 '"^^^^"^ 

= -A{T,v)+Op{l), 

where A(r, v) = J^j^-^ Jq ' (l)y2 — dFu{w). It is easy to show 

that A(t, f) is positive definite by condition (vii). 

(c) Asymptotic normality of $,{v). Since C maximizes iniC)^ by Taylor 
expansion around 0, we have 

-4(0) = 4(C)- 4(0) = (C(r)fC, 

where C lies between and C (strictly speaking, the intermediate point 
can depend on the element of 4i but this does not alter the proof). Hence, 
C — > in probability. It follows from (23) that 

C - A{T,vr'B4T, v) = -CCCC*)rHtiO) - Bn{T,v)) + op(l). 
By Theorem 1, (23) and Slutsky's theorem, we obtain that 

V^CC - Mr, v)-^Bn{T, v)) ^ iV(0, (r, v)n{T, v)A-^ (r, v)). 
We now simplify the matrix A(r, v). By some simple calculation, we have 

/'aj2{w) 

(24) s*2{w) = aj2{w)n2 a.ji{w)fi2 

\ aji{w)fi2 ajo{w)fi2, 

Similarly, we obtain that (Sji(w))®^ = diag(aji(t(;)ajj^(7i;), 0). Note that s*q{w) 
ajo{w). By some tedious basic calculation, we have A(r, v) = diag(r~^, Q2M2)- 
Hence, the asymptotic bias of the estimator ^{v) is 6(t, v) = A~^(t, t')B„(r, v) - 
h'^ep^Q{v)fj.2/2 and the asymptotic covariance is 

I](r, v) = A-i (r, v)U{t, v) (A'^ (t, v)f 

= diag(r, '^^2) + A-^UoiA-y. 
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Therefore, we have finished the proof of the asymptotic normahty of the 
maximum local pseudo-partial likelihood function estimator. □ 

From the proof of Theorems 2 and 3, we have the following result. If 
n/i^ 0, then 

(25) sup |(n/i)V2H{|(u)-4oM}-A-Hr,^oWK(^oM,r)l=op(/^-'/')- 

Proof of Theorem 4. By arguments similar to those used in proving 
Lemma 1, it can be shown that 

(26) sup sup n~^\tpnjit,0) -^pnjit,Oo)\ — >0 
te[o,T] ||0-0,)||<||6»-0o|| 

in probability, where 

n 

i>njit,e)=J2liVij G V)Yijit)exp{P^iVij)Zijit)+giVij)}, 
1=1 

equaling {p'^ {■), g{-))'^ . By the definition of Aoj(t), we can show that 
Aoj(t) — Aoj(t) can be represented by a summation of three terms that are 
functionals of ipnj{t^ ^) — i^njit, 6q) and it follows that these terms are negli- 
gible in the sense of probability. Hence, Aoj(t) Aoj{t), uniformly on (0,r]. 
Therefore, we can prove by the standard argument of kernel estimation that 
Aoj(t) Xoj{t), uniformly on (0,r]. □ 

Proof of Theorem 5. By (25), Theorem 2 and an argument similar 
to that of Theorem 3 in [25], we can prove Theorem 5. □ 
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